NCHLT: isiNdebele POS tag set

Tag set

For the purposes of annotation, this tagset is largely adopted from Taljard et al. (2008) and from various documents compiled by G. Faasz and U. Heid (IMS, Stuttgart) and by D.J. Prinsloo and E. Taljard (University of Pretoria). The information below reflects the current state of the tagset; further development will probably necessitate changes.

The tagset is mainly based on the lexical and morphological criteria defined by Lombard (1985) and Louwrens (1991). The logical structure of the tagset is divided into two layers of linguistic description (annotation levels):

The first annotation level (level 1) contains all mandatory information (obligatory information, in EAGLES terms). A level-1 tag consists of up to three elements: the first indicates the word class, the second specifies functional or syntactic properties, and the third gives morphological specifics; cf., e.g., PRO(noun) + EMP(hatic) + PERS(on) = PROEMPPERS.

The second annotation level (level 2) contains recommended and optional information. In most cases, this level is used for a detailed description of the closed-class items listed in the tagger lexicon. Compare the following excerpt:

 

Figure 1: Annotation levels

Description | Tag, 1st level (mandatory information) | Tag, 2nd level (optional/recommended information)
Pronouns: emphatic personal | PROEMPPERS | 1sg, 2sg, 1pl, 2pl
Verbals | V | tr
Morphemes: deficient | MORPH | def
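The excerpt shows level 1 and level 2 as separate columns, while the per-category sections of this document write level-2 information as a suffix after an underscore (V_tr, COP_neg, PROEMPPERS_1sg). Assuming that convention holds throughout, a full tag can be split into and rebuilt from its two levels mechanically. The Python sketch below illustrates this; `split_tag` and `join_tag` are hypothetical helper names, not part of any NCHLT tool.

```python
from typing import Optional, Tuple


def split_tag(tag: str) -> Tuple[str, Optional[str]]:
    """Split a full tag into its level-1 and level-2 parts.

    Assumption: level-2 information follows the level-1 tag after the first
    underscore (PROEMPPERS_1sg, V_tr). A tag without an underscore carries
    level-1 information only, and None is returned for level 2.
    """
    level1, sep, level2 = tag.partition("_")
    return level1, (level2 if sep else None)


def join_tag(level1: str, level2: Optional[str] = None) -> str:
    """Build a full tag; leaving out level 2 yields a plain level-1 tag."""
    return f"{level1}_{level2}" if level2 else level1


# Tags taken from this document.
for full in ["PROEMPPERS_1sg", "V_tr", "COP_neg", "N01"]:
    print(full, "->", split_tag(full))
```

Since only level 1 has been annotated so far, `split_tag(tag)[0]` is the part of a tag that corresponds to the current annotation.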

 

For disjunctively written languages, linguistic words are tagged in addition to orthographic words, resulting in two layers of POS annotation: one for orthographic words and one for linguistic words. For conjunctively written languages such as isiNdebele, this extra layer of annotation is not needed.

The tagset currently distinguishes 20 categories applicable to isiNdebele and two levels of annotation; however, only level 1 has been annotated. The first part of each tag gives a general indication of the nature of the unit in question. The categories are as follows:

 

Tag | Explanation
PUNC | Punctuation
ABBR | Abbreviation (incl. acronyms)
ADJ | Adjective (incl. enumerative)
ADV | Adverb
CDEM | Class-indicating demonstrative
CONJ | Conjunction
COP | Copulative (copulative subject concord, demonstrative copulative, copulative verb)
FOR | Foreign
IDEO | Ideophone
INT | Interjection
INTER | Question word
N | Noun
NPP | Place and brand name
NUM | Numerative
POSS | Possessive (possessive concord, possessive pronoun)
PROEMP | Emphatic pronoun
PROQUANT | Quantitative pronoun
REL | Relative
V | Verbal
VAUX | Auxiliary verb

 

 

 

 

Tags not applicable to isiNdebele

ASP | Aspectual marker
AUX | Auxiliary stem
CN | Class-indicating nominal prefix
CO | Class-indicating object concord
CS | Class-indicating subject concord
MNEG | Negative morpheme
PART | Particle
TENS | Tense marker
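Several category prefixes in the tables above are themselves prefixes of other categories (INT and INTER; N, NPP and NUM; V and VAUX), so a program that recovers the general category from a level-1 tag has to prefer the longest matching prefix. The Python sketch below encodes both tables under that assumption. `category_of` is a hypothetical helper; it only identifies the category and does not check that the rest of the tag (class number, LOC, PERS and so on) is well formed.

```python
from typing import Optional

# General categories applicable to isiNdebele (first part of every level-1 tag).
CATEGORIES = {
    "PUNC": "Punctuation",
    "ABBR": "Abbreviation (incl. acronyms)",
    "ADJ": "Adjective (incl. enumerative)",
    "ADV": "Adverb",
    "CDEM": "Class-indicating demonstrative",
    "CONJ": "Conjunction",
    "COP": "Copulative",
    "FOR": "Foreign",
    "IDEO": "Ideophone",
    "INT": "Interjection",
    "INTER": "Question word",
    "N": "Noun",
    "NPP": "Place and brand name",
    "NUM": "Numerative",
    "POSS": "Possessive",
    "PROEMP": "Emphatic pronoun",
    "PROQUANT": "Quantitative pronoun",
    "REL": "Relative",
    "V": "Verbal",
    "VAUX": "Auxiliary verb",
}

# Tags of the general tagset that are not applicable to isiNdebele.
NOT_APPLICABLE = {"ASP", "AUX", "CN", "CO", "CS", "MNEG", "PART", "TENS"}


def category_of(tag: str) -> Optional[str]:
    """Return the general category of a tag, or None if no category applies."""
    level1 = tag.partition("_")[0]  # drop any level-2 suffix such as _tr
    if level1 in NOT_APPLICABLE:
        return None
    matches = [prefix for prefix in CATEGORIES if level1.startswith(prefix)]
    # Longest match wins, so INTER beats INT, NPP beats N and VAUX beats V.
    return max(matches, key=len) if matches else None


print(category_of("INTER_loc"), category_of("NLOC"), category_of("CS"))
```

For example, `category_of("PROEMPPERS_1sg")` gives PROEMP, while a tag from the non-applicable list such as CS gives None.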

 


PUNCTUATION

Level 1: PUNC

Notes:

Examples:

; | PUNC
( | PUNC
! | PUNC

PUNC

 

ABBREVIATION

Level 1: ABBR

Notes:

Examples:

isib | ABBR
NGO | ABBR

 

ADJECTIVE

Level 1: ADJ01-11, ADJ14-15, ADJ01a, ADJ02a, ADJLOC

Notes:

Examples:

omunye | ADJ01
elikhulu | ADJ05
komunye | ADJLOC
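The Level 1 lines in these sections use a compact range notation. Reading "ADJ01-11" as one tag per class number from 01 to 11 (an interpretation of the notation, not a definition stated in the source), the full tag inventory of a category can be generated from its Level 1 line. A minimal Python sketch follows; `expand_level1` is a hypothetical name, and the pattern tolerates a stray space between prefix and range.

```python
import re
from typing import List

# A range item such as "ADJ01-11": tag prefix, first class number, last class number.
# The \s* tolerates a stray space between the prefix and the range.
RANGE_ITEM = re.compile(r"^([A-Z]+)\s*(\d{2})-(\d{2})$")


def expand_level1(spec: str) -> List[str]:
    """Expand a comma-separated Level 1 specification into individual tags."""
    tags: List[str] = []
    for item in (part.strip() for part in spec.split(",")):
        match = RANGE_ITEM.match(item)
        if match:
            prefix, first, last = match.group(1), int(match.group(2)), int(match.group(3))
            # Class numbers are written with two digits (01, 02, ..., 11).
            tags.extend(f"{prefix}{n:02d}" for n in range(first, last + 1))
        elif item:
            tags.append(item)  # single tags such as ADJ01a or ADJLOC
    return tags


adjective_tags = expand_level1("ADJ01-11, ADJ14-15, ADJ01a, ADJ02a, ADJLOC")
print(len(adjective_tags), adjective_tags[:3])
```

The same function applies unchanged to the CDEM, N, POSS, PROEMP and PROQUANT lines, which use the same notation.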

ADVERB

Level 1: ADV, ADVLOC

Notes:

Examples:

kanye | ADV
phambili | ADV
engaphasi | ADVLOC

 

[CLASS-INDICATING] DEMONSTRATIVE

Level 1: CDEM01-11, CDEM14-15, CDEMLOC

Notes:

Examples:

labo | CDEM02
loyo | CDEM03
lapho | CDEMLOC

 

CONJUNCTION

Level 1: CONJ

Notes:

Examples:

namkha | CONJ
ukuba | CONJ

 

COPULATIVE

Level 1: COP

Level 2: COP_neg, COP_nil

Notes:

(-be, - and -bilê). For the copulative verb stem -se, the level-2 tag COP_neg is used, as it is for the verb stem -be (< -ba) when it is used in the negative form.

Examples:

yiKomidi | COP
kube | COP

 

FOREIGN

Level 1: FOR

Notes:

Examples:

provincial | FOR
systems | FOR

 

IDEOPHONE

Level 1: IDEO

Examples:

godu | IDEO
yeke | IDEO

 

INTERJECTION

Level 1: INT

Level 2: INT_neg, INT_nil

Notes:

Examples:

na | INT
nekomo | INT

 

INTERROGATIVES

Level 1: INTER

Level 2: _man, _time, _loc, _N01a, _N02a

Notes:

Examples:

na | INTER
bunjani | INTER
mangaki | INTER

NOUN

Level 1: N01-11, N14-15, N01a, N02a, NLOC, N00

Level 2: _aug, _dim, _loc, _name, _nil

Notes:

Examples:

umuntu | N01
abomma | N02a
imiphumela | N04
iphrojekthi | N05
amalunga | N06
isiqhema | N07
indawo | N09
mayelana | N00
emsebenzini | NLOC

 

PLACE AND BRAND NAME

Level 1: NPP

Level 2: NPP_place, NPP_brand

Notes:

Examples:

KwaZulu-Natal | NPP
Mars | NPP

 

NUMERATIVE

Level 1: NUM

Notes:

Examples:

2.2 | NUM
2005 | NUM
74(a) | NUM

 

POSSESSIVE

Level 1: POSS01-11, POSS14-15, POSSLOC, POSSPERS, POSSKA

Level 2: POSSPERS_1pl, POSSPERS_2pl

Notes:

Examples:

wephrojekhti | POSS01
sokutlama | POSS07
kamasipala | POSSKA

 

EMPHATIC PRONOUN

Level 1: PROEMP01-11, PROEMP14-15, PROEMPLOC, PROEMPPERS

Level 2: PROEMPPERS_1sg, PROEMPPERS_1pl, PROEMPPERS_2sg, PROEMPPERS_2pl

Notes:

Examples:

bona | PROEMP02
kizo | PROEMPLOC
khona | PROEMP15

 

QUANTITATIVE PRONOUN

Level 1: PROQUANT01-11, PROQUANT14-15, PROQUANTLOC

Notes:

Examples:

boke | PROQUANT02
zoke | PROQUANT10
koke | PROQUANT15

 

RELATIVE

Level 1: REL

Notes:

 

Examples:

angeze | REL
elibanzi | REL
esingaba | REL

 

VERBAL

Level 1: V

Level 2: V_tr, V_itr, V_dtr

Notes:

Examples:

babe | V
ukwakha | V
inikelwe | V

 

AUXILIARY VERB

Level 1: VAUX

Level 2: VAUX_tr, VAUX_itr, VAUX_dtr

Notes:

Examples:

ibe | VAUX
ukungabi | VAUX